ABSTRACT

Linear threshold machines are defined to be those whose
computations are based on the outputs of a set of linear
threshold decision elements. The number of such elements
is called the rank of the machine. An analysis of the
computational geometry of finite-rank linear threshold
machines, analogous to the analysis of finite-order perceptrons
given by Minsky and Papert, reveals that the use of
such machines as "general purpose pattern recognition
systems" is severely limited. For example, these machines
cannot recognize any topological invariant, nor can they
recognize non-trivial figures "in context".

1. Introduction

This paper is a contribution to "computational geometry"
in the spirit of the book Perceptrons by M. Minsky and
S. Papert [1]. That is, we seek insights into the amount of
computation "inherently needed" to recognize various geometric
figures. In doing so, we raise issues about the use of
parallel computation, analogue devices, and other pattern
recognition techniques. This section briefly reviews the
setting given in [1] for such a study and provides an
introduction to the remainder of the paper.

By a retina, R, we mean a collection of points, and
by a figure on the retina some subset $X \subset R$. The size of
the retina, $|R|$, is the number of points in R. In studying
pattern recognition we usually imagine R to be a finite
set whose points are regarded as the squares in some
two-dimensional plane grid and "arbitrary geometric figures" as
approximated by some collection of squares. (Figure 1.)

Figure 1



Geometric figures on a grid 



A predicate on R is a function $\psi$ defined for
figures $X \subset R$ which can assume only the values 0 and
1. Examples of geometric predicates are:

$\lceil X \text{ is a square} \rceil$
$\lceil X \text{ is convex} \rceil$
$\lceil X \text{ contains more than [?] points} \rceil$

(Here, as in [1], we use the notation

$\lceil \text{some condition} \rceil$

to mean the value which is 1 if the condition is true and 0
if the condition is false.)

In computational geometry we are interested in synthesizing
"complex" predicates out of "simpler" ones. One
measure of the simplicity of a predicate is its order. A
predicate $\varphi$ is said to be of order k if $\varphi$ makes
its decision by examining at most k points of R, i.e.,
if there exists a set S of k points such that

$\varphi(X) = \varphi(X \cap S)$ for all $X \subset R$.



If $\Phi = \{\varphi_1, \varphi_2, \ldots\}$ is a collection of predicates,
then a perceptron based on $\Phi$ is another predicate $\psi$ which
is of the form

$\psi(X) = \lceil \sum_i a_i \varphi_i(X) > \theta \rceil$

where $a_1, a_2, \ldots, a_n$ are real numbers.

In other words, a perceptron is the result of a linear
threshold decision applied to a weighted sum of other
predicates. The $a_i$ are the weights and $\theta$ is the threshold.

The order of the perceptron $\psi$ is the maximum order of
any of the predicates in the collection $\Phi$. Notice that a
perceptron of order 1 is precisely what is usually called a
linear threshold function on R.

In [1] Minsky and Papert consider questions such as
"What order perceptrons are necessary in order to compute
various geometric predicates?" They show, for example, that

$\lceil X \text{ is locally convex} \rceil$ can be computed with a
perceptron of order 3

and $\lceil X \text{ is a (discrete approximation to a) circle} \rceil$ can
be computed with order 4.

More interesting are the results which illustrate
fundamental limitations of perceptrons. One can ask if a
predicate is of finite order, i.e., if it can be computed by
a perceptron of some fixed order, regardless of the size of
the retina. (See §1.6 of [1] for a formal definition.)
Minsky and Papert show that such predicates as

$\lceil X \text{ is connected} \rceil$
$\lceil X \text{ has at least 3 components} \rceil$

are not of finite order. Indeed, a main theorem of [1] states
that the only topologically invariant predicates which can
be computed in finite order are those which are functions of
the Euler characteristic (see [1] §5.9).

Another kind of simple machine, the Gamba Perceptron,
is described in [1] as a kind of "perceptron" in which each
of the "simple predicates" $\varphi_i$ is itself a linear threshold
function:

$\varphi_i(X) = \lceil \sum_{x_j \in R} b_{ij} x_j(X) > c_i \rceil$

Here $x_j(X)$ denotes the order 1 predicate

$x_j(X) = \lceil x_j \in X \rceil$.

Viewed as a perceptron, the Gamba machine has order
equal to the size of the retina $|R|$, since each $\varphi_i$ looks
at the entire retina. Hence the order restriction techniques
of [1] do not give much information about the capabilities of
this kind of device.

From another point of view, however, the Gamba machine
is nowhere nearly as complex as the general order-$|R|$
perceptron. Rather, it is a simple "two-layer" device, in
which each layer is made up of linear threshold elements.
More generally, one could consider "multilayer machines", in
which each layer makes linear threshold decisions based on
results of previous layers.

This paper deals with properties of these "multilayer"
machines. The computational devices we will be concerned
with are called Linear Threshold Machines. A linear threshold
machine is a general purpose computer together with a number
of linear threshold elements $\varphi_1, \ldots, \varphi_r$. The general purpose
computer is allowed to perform any computation whatsoever, with
one restriction: computations cannot be based upon "direct
observation" of the retina itself, but rather upon the outputs
of the threshold functions $\varphi_1, \ldots, \varphi_r$. (Figure 2.) The
rank of the linear threshold machine is defined to be the number
of linear threshold functions $\varphi_1, \ldots, \varphi_r$.




Figure 2 



Linear threshold machine of rank r

This class of machines includes the Gamba perceptron,
the multilayer machines, and in fact any kind of pattern
recognition device that can be constructed out of linear
threshold elements so long as the arrangement of interconnections
does not include any loops. (Permitting loops would allow one
to build a universal computer out of linear threshold elements.)

We begin, in Section 2, with a formal definition of
linear threshold machines. Then in Section 3 we show that
the parity predicate

$\psi_{par}(X) = \lceil \text{the number of squares in } X \text{ is odd} \rceil$

is not of finite rank. This allows us, in Section 4, to apply
techniques of [1] to deduce that, as is the case with finite
order perceptrons, the only topologically invariant predicates
which could be of finite rank are functions of the Euler
characteristic.

In Section 5, we begin to consider the problems of
"infinite" or "arbitrarily large" retinas. We introduce the
notion of uniform linear threshold machine, a linear threshold
machine of fixed rank which can make computations which are
"independent" of the size of the retina. Section 6 gives some
examples of predicates which can be computed, somewhat
surprisingly, by uniform linear threshold machines of rank 2.

Section 7 deals with the Saturation Theorem, our main
technique for obtaining restrictions on the possible computations
of uniform linear threshold machines. Section 8 applies this
to show that, as opposed to the finite order perceptrons, which
can compute the Euler characteristic, the uniform linear
threshold machine cannot compute any non-trivial topological
invariants. Section 9 gives further applications of the
saturation technique and demonstrates the inability of these
machines to recognize figures in context. Section 10 returns
to give a more careful version of the Saturation Theorem and
shows, for example, that if a linear threshold machine with
"bounded coefficients" is to escape the saturation phenomenon,
its rank must grow with the size of the retina, albeit very
slowly (as $\log \log |R|$).



2. Linear Threshold Machines

Definition 2.1 A linear threshold function $\varphi$ on a retina
R is a particular kind of predicate computed as follows:
For some real-valued function $\mu$ on R and a real number
$\theta$ we have

$\varphi(X) = \lceil \sum_{x \in X} \mu(x) > \theta \rceil$.

Here $\mu$ is called the measure and $\theta$ the threshold associated
to $\varphi$.
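As a minimal sketch of Definition 2.1, the following Python fragment evaluates a linear threshold function on a small retina. The function name `linear_threshold`, the dictionary representation of the measure, and the 4-point example retina are illustrative assumptions and not part of the paper; the point to notice is that the sum runs over the points of the figure only.

```python
from typing import Dict, FrozenSet, Hashable

Point = Hashable


def linear_threshold(mu: Dict[Point, float], theta: float, figure: FrozenSet[Point]) -> int:
    """Return 1 if the measure summed over the figure exceeds theta, else 0.

    The sum runs over the points of the figure X only, not over the
    whole retina R, as in Definition 2.1.
    """
    total = sum(mu[x] for x in figure)
    return 1 if total > theta else 0


# Hypothetical 4-point retina with unit measure and threshold 2:
# the predicate "X contains more than 2 points".
retina = ["a", "b", "c", "d"]
mu = {x: 1.0 for x in retina}

print(linear_threshold(mu, 2.0, frozenset({"a", "b"})))       # 2 > 2 fails
print(linear_threshold(mu, 2.0, frozenset({"a", "b", "c"})))  # 3 > 2 holds
```

With unit measure the threshold function simply counts points, which is the simplest instance of the definition.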






Now we combine these functions into machines. First
of all, a Boolean r-tuple is defined to be an r-tuple
each of whose elements is 0 or 1.

A rank r decision function $\Delta$ is a function defined
on Boolean r-tuples and which can assume the values 0 or 1.
Finally,

Definition 2.2 A linear threshold machine of rank r

$M = \Delta\Phi$

is a predicate consisting of

(i) An r-tuple of linear threshold functions
$\Phi = (\varphi_1, \varphi_2, \ldots, \varphi_r)$ and

(ii) A rank r decision function $\Delta$ such that

$M(X) = \Delta(\varphi_1(X), \ldots, \varphi_r(X))$.

This is the class of machines with which we will be
concerned in this paper. The following observations are
clearly true.

1. If the retina has |R| points, then any predicate
on R can be computed by a linear threshold machine of rank
|R|.

2. If $M_1, \ldots, M_k$ are linear threshold machines of
rank $r_1, \ldots, r_k$ respectively, then any Boolean function of
the $M_i$ can be computed by a linear threshold machine of rank
$r_1 + r_2 + \ldots + r_k$.
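The first observation can be illustrated with a short sketch: with one threshold element per retina point, the decision function can reconstruct the figure from the element outputs and then evaluate any predicate at all. The helper names (`make_point_detectors`, `run_machine`, `all_figures`) and the 4-point retina are assumptions made for this example; parity is used as the target predicate only because it is the subject of Section 3.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple


def make_point_detectors(retina: Sequence[str]) -> List[Tuple[dict, float]]:
    """One (measure, threshold) pair per point; detector i fires iff point i is in X."""
    return [({y: (1.0 if y == x else 0.0) for y in retina}, 0.5) for x in retina]


def run_machine(detectors, decision: Callable[[Tuple[int, ...]], int],
                figure: FrozenSet[str]) -> int:
    """M(X) = decision(phi_1(X), ..., phi_r(X)); the decision never sees X itself."""
    outputs = tuple(
        1 if sum(mu[x] for x in figure) > theta else 0 for mu, theta in detectors
    )
    return decision(outputs)


def all_figures(retina: Sequence[str]):
    """Enumerate every subset of the retina."""
    for k in range(len(retina) + 1):
        for subset in combinations(retina, k):
            yield frozenset(subset)


def parity_decision(bits: Tuple[int, ...]) -> int:
    """A rank-|R| decision function: parity of the number of firing detectors."""
    return sum(bits) % 2


retina = ["p0", "p1", "p2", "p3"]
detectors = make_point_detectors(retina)
computes_parity = all(
    run_machine(detectors, parity_decision, X) == len(X) % 2 for X in all_figures(retina)
)
print(computes_parity)
```

The rank used here equals the retina size; the Parity Theorem of Section 3 shows how far below |R| the rank can possibly be pushed.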

The definition (2.1) of linear threshold function is
slightly out of line with that used by Minsky and Papert.
When computing $\varphi(X)$ we only take the summation over the
points of X rather than over the entire retina R. Two
alternative definitions we might have used are

Definition 2.3 "Order 1 Perceptron"

$\varphi(X) = \lceil \sum_{x \in R} a_x \rho(x, X) > \theta \rceil$

where $\rho(x, X)$ is a predicate depending only
on whether or not $x \in X$.

Alternatively,

Definition 2.4 "(-1,1) threshold function"

$\varphi(X) = \lceil \sum_{x \in R} a_x \rho(x, X) > \theta \rceil$

where $\rho(x, X) = 1$ if $x \in X$ and $-1$ if $x \notin X$.

It is easy to see that all three of the definitions are
equivalent so long as we are dealing with a fixed finite retina
R. If, however, we consider infinite retinas or sequences
of retinas, the different forms (2.1), (2.3), (2.4) make a
difference. For example, the predicate

$\lceil \text{area } X > \tfrac{1}{2} \text{ area } R \rceil$

is easily expressed as a (-1,1) threshold function

$\lceil \sum_{x \in R} \rho(x, X) > 0 \rceil$

but a type (2.1) threshold function for this same predicate
must involve constants which grow large as the size of the
retina becomes large. We have chosen to work with the form
(2.1) since we wish to make computations which depend only on
the figure X itself, and not explicitly on the retina R.
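The contrast between the two forms can be sketched as follows. The function names are hypothetical, and the type (2.1) version shown uses unit measure with threshold |R|/2, which is one convenient choice rather than the only one. The (-1,1) form uses threshold 0 for every retina, while the constant in the (2.1) form grows with the retina.

```python
from itertools import combinations


def majority_pm1(retina, figure):
    """(-1,1) form: sum over the whole retina of +1 (in X) or -1 (not in X), threshold 0."""
    return int(sum(1 if x in figure else -1 for x in retina) > 0)


def majority_type21(retina_size, figure):
    """Type (2.1) form with unit measure: the sum runs over X only, so the
    threshold must carry the retina size explicitly."""
    return int(len(figure) > retina_size / 2)


def forms_agree(n):
    """Check both forms on every figure of an n-point retina."""
    retina = list(range(n))
    return all(
        majority_pm1(retina, frozenset(c)) == majority_type21(n, frozenset(c))
        for k in range(n + 1)
        for c in combinations(retina, k)
    )


print(forms_agree(4), forms_agree(5))
```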

Finally, there is one more assumption we will make
about the threshold functions, that of finite sensitivity:
that the values of the measure cannot be arbitrarily small in
absolute value:

2.5 Hypothesis of finite sensitivity. With each threshold
function there is associated a sensitivity $\epsilon$ such that, for
any $x \in R$ either $\mu(x) = 0$ or $|\mu(x)| > \epsilon$.

This hypothesis will certainly be satisfied for any
linear threshold function built out of actual physical
components, for example, out of optical filters and photo
detectors.



3. Parity

We will be concerned, as in [1], with predicates that
can be computed by linear threshold machines which are
"independent of the retina". Our first attempt at formalizing
this concept is the notion of "finite rank".

Definition 3.1 A predicate $\psi$ is of finite rank r if for
any size retina R there is a linear threshold machine of
rank r which computes $\psi$ on R.

In this section we exhibit a predicate which is not
of finite rank. This is the "parity predicate"

$\psi_{par}(X) = \lceil X \text{ contains an odd number of points of } R \rceil$

We shall show that, for a linear threshold machine to be able
to recognize parity, its rank must grow at least logarithmically
with the size of the retina. More precisely

3.2 Parity Theorem. Suppose M is a linear threshold machine
of rank r which computes parity on a retina R. Then

$|R| \le r 2^{r-1}$



Before proceeding with the proof, we first introduce
some notation. Let $M = \Delta\Phi$, where

$\varphi_i(X) = \lceil \sum_{x \in X} \mu_i(x) > \theta_i \rceil$.

For any $X \subset R$ let

$S_i(X) = \sum_{x \in X} \mu_i(x)$

and let $\sigma_i(X) = \lceil S_i(X) > 0 \rceil$.

Finally, let $\Phi(X)$ be the Boolean r-tuple

$\Phi(X) = (\varphi_1(X), \varphi_2(X), \ldots, \varphi_r(X))$

and let $\Sigma(X)$ be the Boolean r-tuple

$\Sigma(X) = (\sigma_1(X), \sigma_2(X), \ldots, \sigma_r(X))$.

Now recall the usual Boolean notion of "implication",
i.e., $0 \to 1$, $1 \to 1$, $0 \to 0$ are all valid, but $1 \to 0$ is
not valid. This extends to a partial order on Boolean r-tuples:

3.3 Definition. If a and b are Boolean r-tuples, then
we say that $a \le b$ if $a_i \to b_i$ for $i = 1, 2, \ldots, r$.

Our first step in proving Theorem 3.2 is to show that
any linear threshold machine can be put in "normal form":

3.4 Definition. A linear threshold machine $M = \Delta\Phi$ is said
to be normal if each component linear threshold function
evaluates to zero on the empty set, i.e.

$\Phi(\emptyset) = (0, 0, \ldots, 0)$.

This is equivalent to saying that each of the thresholds
$\theta_i$ is positive.

3.5 Normalization Lemma. If M is a linear threshold machine
of rank r then there is a normal linear threshold machine,
also of rank r, which computes the same predicate as M.

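Proof: Suppose, by reordering, that $\varphi_i(\emptyset) = 1$ for $i = 1, \ldots, k$
and $\varphi_i(\emptyset) = 0$ for $i = k+1, \ldots, r$. We will produce a new
linear threshold machine by modifying the first k threshold
functions. Namely, if

$\varphi_i(X) = \lceil \sum_{x \in X} \mu_i(x) > \theta_i \rceil$

is a linear threshold function, define a new threshold function
$\bar\varphi_i$ by

$\bar\varphi_i(X) = \lceil \sum_{x \in X} (-\mu_i(x)) \ge -\theta_i \rceil$

so that $\bar\varphi_i(X) = 1$ if and only if $\varphi_i(X) = 0$. So now let
$M'$ be the linear threshold machine

$M' = \Delta(1 - \bar\varphi_1, \ldots, 1 - \bar\varphi_k, \varphi_{k+1}, \ldots, \varphi_r)$.

The component threshold functions $\bar\varphi_1, \ldots, \bar\varphi_k, \varphi_{k+1}, \ldots, \varphi_r$
of $M'$ all evaluate to zero on the empty set, and $M'(X) = M(X)$
for every X.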
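The normalization step can be sketched in a few lines. In this illustration each threshold function is stored as a triple (measure, threshold, strict), where `strict` selects between ">" and ">="; this representation, the function names, and the example machine are assumptions made for the sketch, not notation from the paper. A function that fires on the empty set is replaced by its complement, and the decision function undoes the complement.

```python
from itertools import combinations


def evaluate(fn, figure):
    """fn = (mu, theta, strict): strict means 'sum > theta', otherwise 'sum >= theta'."""
    mu, theta, strict = fn
    total = sum(mu[x] for x in figure)
    return int(total > theta) if strict else int(total >= theta)


def run(functions, decision, figure):
    """Apply the decision function to the tuple of threshold outputs."""
    return decision(tuple(evaluate(fn, figure) for fn in functions))


def normalize(functions, decision):
    """Return an equivalent machine whose threshold functions vanish on the empty set."""
    new_functions, flipped = [], []
    for fn in functions:
        mu, theta, strict = fn
        fires_on_empty = evaluate(fn, frozenset()) == 1
        if fires_on_empty:
            # Negating measure and threshold and swapping > with >= gives the complement.
            fn = ({x: -w for x, w in mu.items()}, -theta, not strict)
        new_functions.append(fn)
        flipped.append(fires_on_empty)

    def new_decision(bits):
        restored = tuple(1 - b if f else b for b, f in zip(bits, flipped))
        return decision(restored)

    return new_functions, new_decision


retina = ["a", "b", "c"]
functions = [
    ({"a": 1.0, "b": -2.0, "c": 0.5}, -0.5, True),  # fires on the empty set
    ({"a": -1.0, "b": 1.0, "c": 1.0}, 0.5, True),   # already zero on the empty set
]
decision = lambda bits: bits[0] ^ bits[1]
normal_functions, normal_decision = normalize(functions, decision)
figures = [frozenset(c) for k in range(4) for c in combinations(retina, k)]
same_predicate = all(
    run(functions, decision, X) == run(normal_functions, normal_decision, X) for X in figures
)
print(same_predicate)
```

The rank is unchanged because each threshold function is replaced by exactly one other threshold function.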



The key observation in the proof of Theorem 3.2 is the
following "regularity condition" for linear threshold machines:

Lemma 3.6. Suppose $M = \Delta\Phi$ is a normal linear threshold
machine on R. Suppose X and Y are disjoint subsets of
R with $\Phi(X) \le \Sigma(Y)$. Then

$\Phi(X) \le \Phi(X \cup Y) \le \Sigma(Y)$.

Proof: We'll verify that $\varphi_i(X) \to \varphi_i(X \cup Y) \to \sigma_i(Y)$ for
each i.

Suppose $\sigma_i(Y) = 0$, so that $S_i(Y) \le 0$. Then, by
hypothesis, $\varphi_i(X)$ must also be 0, so that $S_i(X) \le \theta_i$.
Therefore, since X and Y are disjoint, we have
$S_i(X \cup Y) = S_i(X) + S_i(Y) \le \theta_i$, that is, $\varphi_i(X \cup Y) = 0$.
Hence $\varphi_i(X \cup Y) = 0$ whenever $\sigma_i(Y) = 0$, i.e.
$\Phi(X \cup Y) \le \Sigma(Y)$.

Now suppose that $\varphi_i(X \cup Y) = 0$, i.e.,

$S_i(X) + S_i(Y) \le \theta_i$.   (3.5)

Then

Case 1: If $S_i(Y) \le 0$ we have $\sigma_i(Y) = 0$, by normality.
So the hypothesis implies that $\varphi_i(X)$ must be 0.

Case 2: If $S_i(Y) > 0$ then equation 3.5 implies that
$S_i(X) \le \theta_i$, i.e., $\varphi_i(X) = 0$.

So, in either case, $\varphi_i(X) = 0$ whenever $\varphi_i(X \cup Y) = 0$,
that is, $\Phi(X) \le \Phi(X \cup Y)$.

The next lemma applies these "regularity" considerations
to the parity predicate. First, if v is any Boolean
r-tuple define ones(v) to be the number of ones in v.

Lemma 3.7. Suppose M is a normal linear threshold machine
which computes parity on a retina R, and suppose that
$x_1, x_2, \ldots, x_m$ are distinct points of R with

$\Sigma(x_1) \le \Sigma(x_2) \le \ldots \le \Sigma(x_m)$.

Then $\text{ones}(\Sigma(x_m)) \ge m$.

Proof: Define subsets $V_k$ of R to be

$V_k = x_1 \cup x_2 \cup \ldots \cup x_k$

$V_0 = \emptyset$.

Since M is normal we have $\Phi(V_0) \le \Sigma(x_1) \le \Sigma(x_2)$, so
Lemma 3.6 applies to give

$\Phi(V_0) \le \Phi(x_1 \cup V_0) \le \Sigma(x_1)$

that is, $\Phi(V_0) \le \Phi(V_1) \le \Sigma(x_1)$.

This inequality, along with the hypothesis, now implies that
$\Phi(V_1) \le \Sigma(x_2)$, so once again we can apply Lemma 3.6 to obtain

$\Phi(V_1) \le \Phi(V_2) \le \Sigma(x_2)$.

Continuing in this manner, we get

$\Phi(V_0) \le \Phi(V_1) \le \ldots \le \Phi(V_m) \le \Sigma(x_m)$.

But $V_i$ and $V_{i-1}$ have opposite parity so $\Phi(V_i) \ne \Phi(V_{i-1})$.
Therefore the vector $\Phi(V_i)$ contains at least one more "one"
than the vector $\Phi(V_{i-1})$ and so $\Sigma(x_m) \ge \Phi(V_m)$ must contain
at least m ones.
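The regularity condition of Lemma 3.6 lends itself to a brute-force check on small retinas. The sketch below assumes the reading $\varphi_i(X) = \lceil S_i(X) > \theta_i \rceil$ with non-negative thresholds (so the machine is normal), $\sigma_i(X) = \lceil S_i(X) > 0 \rceil$, and the hypothesis $\Phi(X) \le \Sigma(Y)$; the random integer measures, the fixed seed, and all function names are choices made for this illustration only.

```python
import random
from itertools import combinations


def subsets(points):
    """Enumerate every subset of a list of points."""
    for k in range(len(points) + 1):
        for c in combinations(points, k):
            yield frozenset(c)


def leq(a, b):
    """Componentwise order on Boolean tuples (Definition 3.3)."""
    return all(x <= y for x, y in zip(a, b))


def check_regularity(measures, thresholds, points):
    """Test Phi(X) <= Phi(X u Y) <= Sigma(Y) for all disjoint X, Y with Phi(X) <= Sigma(Y)."""
    r = len(measures)

    def S(i, X):
        return sum(measures[i][x] for x in X)

    def Phi(X):
        return tuple(int(S(i, X) > thresholds[i]) for i in range(r))

    def Sigma(X):
        return tuple(int(S(i, X) > 0) for i in range(r))

    for X in subsets(points):
        for Y in subsets(points):
            if X & Y or not leq(Phi(X), Sigma(Y)):
                continue  # hypothesis of the lemma not satisfied
            if not (leq(Phi(X), Phi(X | Y)) and leq(Phi(X | Y), Sigma(Y))):
                return False
    return True


rng = random.Random(0)
points = list(range(5))
ok = True
for _ in range(50):
    measures = [{x: rng.randint(-4, 4) for x in points} for _ in range(3)]
    thresholds = [rng.randint(0, 3) for _ in range(3)]  # non-negative: normal machine
    ok = ok and check_regularity(measures, thresholds, points)
print(ok)
```

Such a check is no substitute for the proof, but it is a quick way to confirm that the inequalities have been read in the intended direction.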



Corollary 3.8. Suppose M is a linear threshold machine
which computes parity on a retina R. Suppose that all the
measure functions $\mu_i$ in the threshold functions for M take
on only positive values. Then

$\text{rank } M \ge |R|$.

Proof: The hypothesis implies that $\Sigma(X) = (1, 1, \ldots, 1)$ for
every nonempty subset X of R. Therefore we have

$\Sigma(x_1) = \Sigma(x_2) = \ldots = \Sigma(x_{|R|})$

and so Lemma 3.7 implies that

$\text{ones}(\Sigma(x_{|R|})) \ge |R|$.

But $\text{ones}(\Sigma(x_{|R|})) \le$ number of elements in $\Sigma(x_{|R|})$ = rank M.

The same kind of reasoning as in Corollary 3.8 provides
the proof of the Parity Theorem:

Proof of Theorem 3.2:

Suppose M computes parity and has rank r. By the
Normalization Lemma we may assume that M is normal. Let

$B(r) = \sum_v \text{ones}(v)$

where the sum is taken over all distinct Boolean r-tuples
v. Then we claim that |R|, the size of the retina, must
be less than or equal to B(r). For, consider the r-tuples
$\Sigma(x_i)$ as $x_i$ runs through the elements of R. If $|R| > B(r)$
then there must be some r-tuple v and distinct points
$x_1, \ldots, x_h$ with

$\Sigma(x_1) = \Sigma(x_2) = \ldots = \Sigma(x_h) = v$

and $h > \text{ones}(v)$. But this is impossible, by Lemma 3.7.
Thus, $|R| \le B(r)$.

Finally, it remains only to compute B(r):

$B(r) = \sum_v \text{ones}(v) = \sum_{k=0}^{r} k \cdot (\text{the number of r-tuples } v \text{ with ones}(v) = k) = \sum_{k=0}^{r} k \binom{r}{k} = r 2^{r-1}$.
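The closing computation can be confirmed numerically. The sketch below compares a direct enumeration of all Boolean r-tuples with the binomial sum and with the closed form $r 2^{r-1}$; the helper `min_rank_for_parity`, which inverts the bound $|R| \le r 2^{r-1}$, is a hypothetical convenience added for this example.

```python
from itertools import product
from math import comb


def B_bruteforce(r: int) -> int:
    """Sum of ones(v) over all Boolean r-tuples v."""
    return sum(sum(v) for v in product((0, 1), repeat=r))


def B_binomial(r: int) -> int:
    """Group the tuples by their number of ones: sum of k * C(r, k)."""
    return sum(k * comb(r, k) for k in range(r + 1))


def B_closed(r: int) -> int:
    """Closed form r * 2^(r-1)."""
    return r * 2 ** (r - 1) if r > 0 else 0


def min_rank_for_parity(retina_size: int) -> int:
    """Smallest r with r * 2^(r-1) >= |R|: a lower bound on the rank needed for parity."""
    r = 1
    while B_closed(r) < retina_size:
        r += 1
    return r


for r in range(1, 9):
    print(r, B_bruteforce(r), B_binomial(r), B_closed(r))
```

Inverting the bound shows the logarithmic growth: a retina of 1024 points already forces rank at least 8.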



4. Topological Consequences of the Parity Theorem

This section follows Minsky and Papert ([1] Chapter 5)
very closely in deriving consequences of the fact that the
parity predicate is not of finite rank. We deduce that,
as parity is not of finite rank, then neither are such
predicates as

$\lceil X \text{ is connected} \rceil$
$\lceil X \text{ has two components, one surrounding the other} \rceil$

and so on. We will show that the only topological predicates
which could be of finite rank can depend only on the Euler
characteristic of X. (In fact, in Section 8, we will show
that even these "Euler predicates" cannot be of finite rank
if we impose certain "uniformity conditions" on our linear
threshold machines.)

Following Minsky and Papert, we show that any scheme
for computing topological invariants (besides Euler
characteristic) on a class of figures $\{X\}$ must also be able
to compute parity on a class of "derived figures" $\{\hat X\}$.
Hence, any machine which is "confused" by parity must necessarily
also be confused by topological invariants.

This notion of "predicates on derived figures" is made
precise by Minsky and Papert in Section 5.4 of [1]:

Suppose F is a function which associates to any
figure X in R a figure $\hat X = F(X)$ in $\hat R$. Let $\hat\psi$ be a
predicate on $\hat R$. Then we can define a predicate $\psi$ on R
by

$\psi(X) = \hat\psi(F(X)) = \hat\psi(\hat X)$.

In this context, Minsky and Papert formulate

Collapsing Theorem for Perceptrons ([1], Theorem 5.4.1):
Suppose the function F is such that each point $\hat x$ of $\hat R$
depends on at most one point of R, i.e., the points of $\hat R$
fall into four categories:

$\hat x \in \hat X$ for all X

or $\hat x \notin \hat X$ for all X

or there is a point $x \in R$ such that

$\hat x \in \hat X$ iff $x \in X$

or $\hat x \in \hat X$ iff $x \notin X$.

Then order $\psi \le$ order $\hat\psi$. (That is, if $\hat\psi$ can be computed
by a perceptron of order k, then so can $\psi$.)

Analogously, we have

Theorem 4.1 (Collapsing Theorem for Linear Threshold Machines)

Suppose that, as above, each point $\hat x$ of $\hat R$ depends
on at most one point of R. Then

rank $\psi \le$ rank $\hat\psi$.

That is, if $\hat\psi$ can be computed by a linear threshold machine
of rank r then so can $\psi$.

Proof: Suppose $\hat M = \hat\Delta\hat\Phi$ is a linear threshold machine of
rank r which computes $\hat\psi$ on $\hat R$. Let $\hat\varphi_i$, $i = 1, \ldots, r$ be
the linear threshold functions which comprise $\hat\Phi$. Let
$\varphi_i$ be the predicate on R defined by $\varphi_i(X) = \hat\varphi_i(F(X))$.
Recall (2.3) that as long as we are dealing with a fixed
retina (such as R or $\hat R$) then "linear threshold functions"
are the same as "order 1 perceptrons". Thus we can apply the
Collapsing Theorem for Perceptrons to deduce that the $\varphi_i$
can be computed as linear threshold functions on R. Now
define the linear threshold machine M on R by

$M = \Delta\Phi$

where $\Delta = \hat\Delta$

and $\Phi = (\varphi_1, \varphi_2, \ldots, \varphi_r)$.

Then, for any $X \subset R$,

$M(X) = \hat\Delta(\hat\varphi_1(F(X)), \ldots, \hat\varphi_r(F(X))) = \hat M(F(X)) = \psi(X)$

so that M is a linear threshold machine of rank r which
computes $\psi$.

Corollary 4.2. The predicate

$\psi_{connected}(X) = \lceil X \text{ is connected} \rceil$

is not of finite rank.

Proof: Since we have

1. parity is not of finite rank
2. the Collapsing Theorem is true

the proof is identical to the one given in the context of
finite order perceptrons in Sections 5.5-5.7 of [1]. Basically,
the idea is to construct a function

F: (figures in R) → (figures in $\hat R$)

such that

$\psi_{parity}(X) = \psi_{connected}(\hat X)$

and the Collapsing Theorem implies that

rank $\psi_{parity} \le$ rank $\psi_{connected}$.

See [1] for details.

The techniques of [1] also allow us to deduce that the
only topologically invariant predicates which could be of
finite rank must be functions of the Euler characteristic.

Definition 4.3. A predicate $\psi$ is said to be topologically
invariant if $\psi(X) = \psi(Y)$ whenever X and Y are
topologically equivalent (i.e., X and Y can be "continuously
deformed" into one another).

Corollary 4.4. Let $\psi$ be a topologically invariant predicate
of finite rank. Suppose X and Y are figures with the
same Euler characteristic. Then $\psi(X) = \psi(Y)$.

Proof: The proof exactly follows Theorem 5.9 of [1] which
proves the corresponding result for finite order perceptrons.
The idea is based on a construction due to Paterson which
reduces the computation of $\psi$ modulo Euler characteristic
to the computation of the parity of certain derived figures.
See [1] for details.



5. Infinite Retinas; Uniform Linear Threshold Machines.

In demonstrating that predicates such as parity and
connectedness are not of finite rank, we considered a fixed,
finite retina and found lower bounds for the rank of any
linear threshold machine which computes these predicates. The
lower bound becomes large as the size of the retina becomes
large, hence the predicates are not of finite rank.

But the intuitive concept of "finite rank" carries
a somewhat stronger connotation. Namely, we would like to
think of a "finite rank" predicate as one which can be
somehow computed by a fixed linear threshold machine which works
regardless of the size of the retina. We formalize this notion
below in the definition of uniform linear threshold machine.

Definition 5.1. By an "infinite retina" $R^*$ we will mean
an increasing union of retinas

$R^1 \subset R^2 \subset R^3 \subset \ldots$

A uniform linear threshold function $\varphi$ on $R^*$ is a
compatible collection of linear threshold functions

$\varphi^i(X) = \lceil \sum_{x \in X} \mu^i(x) > \theta^i \rceil$

where $\mu^i$ is a measure function on $R^i$. By "compatible
collection" we mean

1) If $R^j \subset R^i$ then $\mu^i$ restricted to $R^j$ is the
same as $\mu^j$,

2) all the $\theta^i$ are the same.


Thus, it makes good sense, for finite figures X in
$R^*$, to write

$\varphi(X) = \lceil \sum_{x \in X} \mu(x) > \theta \rceil$

where $\mu$ is a well-defined function on the infinite retina
$R^*$.

Definition 5.2. A uniform linear threshold machine of rank r
on $R^*$ is a predicate $M = \Delta\Phi$ where $\Phi = (\varphi_1, \varphi_2, \ldots, \varphi_r)$ is
an r-tuple of uniform linear threshold functions and $\Delta$
is a rank r decision function.

Intuitively, then, we allow our machines to operate on
larger and larger retinas by hooking up more and more inputs
to the linear threshold functions. The thresholds $\theta$ as well
as the decision function $\Delta$ remain unchanged.

Notice that we make no requirement that the measure
functions $\mu$ remain bounded as the retina gets large.

Also we could have defined our "uniform threshold
functions" based on one of the other definitions of "linear
threshold function", 2.3 or 2.4. These would lead to different
classes of machines. However, Definition 5.1 seems more natural
since, for a fixed figure on an "arbitrarily large" retina,
the threshold summations need extend only over the points of
the figure. This seems to capture the intuitive notion of
"computations which depend only on the figure itself, not on
the entire infinite retina".



6. Stratification; Predicates of Rank 2.

Much of this paper is concerned with proving that
various geometric predicates are not of finite rank. In this
section, by way of contrast, we show how certain "symmetry"
predicates can be computed by uniform linear threshold
machines of rank 2. These results are reminiscent of the
"stratification phenomenon" discussed in Chapter 7 of [1].
This consists, roughly, in using very large coefficients to
encode geometric information, thus allowing certain predicates
to be computed by simpler machines than might have been thought
necessary. The details of this technique for linear threshold
machines differ from those given in [1] for perceptrons.
However, the results have the same flavor in both cases, and
so we retain the name "stratification".

Theorem 6.1 (Rank 2 Stratification).

Let $S_1, S_2, \ldots$ be a sequence of disjoint finite subsets
of $R^*$. Let $\psi_i$ be the predicate

$\psi_i(X) = \lceil \text{either } S_i \subset X \text{ or } S_i \cap X = \emptyset \rceil$

then $\psi = \bigwedge_i \psi_i$ can be computed by a uniform linear threshold
machine of rank 2.

(Note: Each $S_i$ must itself be a finite set. But there may
be infinitely many distinct $S_i$'s.)

Proof: For each $S_i$ pick a "base point" $b_i \in S_i$. Let
$n_i$ = (number of elements in $S_i$) - 1. Define the function
$\mu$ as follows:

$\mu(x) = 1 =_{df} m_1$   for $x \in S_1 - b_1$

$\mu(b_1) = -n_1$

and, inductively,

$\mu(x) = 1 + \sum_{y \in S_1 \cup \ldots \cup S_{i-1}} \text{abs}(\mu(y)) =_{df} m_i$   for $x \in S_i - b_i$

$\mu(b_i) = -n_i m_i$

and $\mu(x) = 0$ for x not contained in any $S_i$.
Then define

$\varphi_1(X) = \lceil \sum_{x \in X} \mu(x) \ge 0 \rceil$

$\varphi_2(X) = \lceil \sum_{x \in X} (-\mu(x)) \ge 0 \rceil$.

We claim that $\psi(X)$ is true if and only if $\varphi_1(X)$
and $\varphi_2(X)$ are both true, i.e., if and only if

$\sum_{x \in X} \mu(x) = 0$.

To see this, note first that

$\sum_{x \in S_i} \mu(x) = 0$

by choice of $\mu(b_i)$. Now if $\psi_i(X)$ is true we have that either
$X \cap S_i = S_i$ or $X \cap S_i = \emptyset$. In either case, then

$\sum_{x \in S_i \cap X} \mu(x) = 0$.

So if $\psi(X)$ is true, we have

$\sum_{x \in X} \mu(x) = \sum_i \sum_{x \in S_i \cap X} \mu(x) = 0$.

Conversely, suppose $\psi(X)$ is false and let I be the largest
value of i for which $\psi_i(X)$ is false. (Recall that we are
only concerned with finite figures X, so that I exists.)
Then

$\sum_{x \in X} \mu(x) = \sum_{i : \psi_i(X) \text{ is false}} \sum_{x \in X \cap S_i} \mu(x)$.

Let $A_I = \sum_{x \in X \cap S_I} \mu(x)$ and
$A_R = \sum_{i < I,\ \psi_i(X) \text{ false}} \sum_{x \in X \cap S_i} \mu(x)$. Then
$\sum_{x \in X} \mu(x) = A_I + A_R$. By construction of $\mu$, we have
$\text{abs}(A_I) \ge m_I$ since at least one point of $S_I$ is not in X.
Also, by construction, $m_I > \text{abs}(A_R)$ since, if $\psi_i(X)$ is
false then $i \le I$.

Thus $|A_I| > |A_R|$ so $A_I + A_R \ne 0$.

Finally, notice that this construction will provide
uniform linear threshold functions $\varphi_1$, $\varphi_2$ on a retina
sequence $R^*$: $R^1 \subset R^2 \subset \ldots$. We need only make sure that the
"higher numbered" sets $S_i$ appear in the higher numbered
retinas $R^j$. The crucial point is that we can "enlarge the
retina", i.e., add more $S_i$, without changing the value of $\mu$ on
the lower numbered $S_i$.
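The stratified measure of the proof can be built and tested directly. The following sketch applies the construction to mirror-image pairs on a hypothetical 2 x 4 grid and checks, over every figure, that the rank 2 machine agrees with the predicate $\psi$. The function names, the grid, and the convention of taking the first listed point of each set as its base point are assumptions of this illustration.

```python
from itertools import combinations


def stratified_measure(strata):
    """Build the measure of the proof of Theorem 6.1.

    `strata` is a list of disjoint lists of points; the first point of each
    list serves as the base point b_i (an arbitrary choice).
    """
    mu = {}
    total_abs = 0  # sum of abs(mu(y)) over all earlier strata
    for stratum in strata:
        base, rest = stratum[0], stratum[1:]
        m = 1 + total_abs           # the weight m_i
        for x in rest:
            mu[x] = m
        mu[base] = -len(rest) * m   # makes the stratum sum to zero
        total_abs += 2 * len(rest) * m
    return mu


def rank2_machine(mu, figure):
    """AND of the two threshold functions [sum >= 0] and [-sum >= 0]."""
    total = sum(mu.get(x, 0) for x in figure)
    phi1 = int(total >= 0)
    phi2 = int(-total >= 0)
    return phi1 & phi2


def psi(strata, figure):
    """Each stratum is either contained in the figure or disjoint from it."""
    return int(all(set(s) <= figure or not (set(s) & figure) for s in strata))


def all_figures(points):
    for k in range(len(points) + 1):
        for c in combinations(points, k):
            yield frozenset(c)


# Mirror pairs about the vertical center line of a 2 x 4 grid.
rows, cols = 2, 4
strata = [[(r, c), (r, cols - 1 - c)] for r in range(rows) for c in range(cols // 2)]
points = [p for s in strata for p in s]
mu = stratified_measure(strata)
weights = sorted(w for w in mu.values() if w > 0)
agrees = all(rank2_machine(mu, X) == psi(strata, X) for X in all_figures(points))
print(weights, agrees)
```

For two-element strata the weights come out as successive powers of 3, which shows how quickly the coefficients grow as more sets are added.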



6.2 Examples.

The following predicates all have rank 2:

(a) Draw a vertical line L down the center of the
retina. Define $\psi$ by

$\psi(X) = \lceil X \text{ is symmetric with respect to } L \rceil$.

Here the sets $S_i$ have two elements, consisting of a point $x_i$
along with its reflection in L. Following through the proof of 6.1
we see that the weight $m_i = 3^{i-1}$.

(b) More generally, let G be a finite group acting
on $R^*$. Then

$\psi_G(X) = \lceil X \text{ is invariant under } G \rceil$

has rank 2. Take the $S_i$ to be the orbits of points of $R^*$
under the G-action, i.e.,

$S_i = \bigcup_{g \in G} g(x_i)$ for some $x_i \in R^*$.

(c) Pick a point $x_0 \in R^*$. Then

$\psi(X) = \lceil X \text{ is a bull's-eye centered about } x_0 \rceil$

has rank 2. Take the $S_i$ to be "concentric rings" about $x_0$.



7. Saturation.

We now turn to some predicates which cannot be computed
by uniform linear threshold machines. These include, for
example, predicates which recognize any topological invariant
and predicates which recognize figures in context.

The main technique for obtaining these results is the
Saturation Theorem. This says, roughly, that linear threshold
functions will become "overloaded" as the retina becomes large.
Consequently, parts of figures may become "invisible" to a
linear threshold machine. We formalize this in the notion of
"saturation" and "saturation sequence".



Definition 7.1. Suppose that M is a uniform linear threshold
machine on a retina $R^*$: $R^1 \subset R^2 \subset \ldots$ and that A and B
are subsets of $R^*$ with $A \subset R^a$, $B \subset R^b$, $a < b$. Then we
say that B saturates M with respect to A on $R^a$, if
for any $S \subset R^a$ we have

$M(B \cup S) = M(A \cup B \cup S)$.
(See Figure 3.)

Intuitively, the idea is that B "overwhelms" the
decision elements of M to such an extent that M cannot
"see" A.

Figure 3

B saturates M with respect to A on $R^a$

DeflnUlon 7.2 Suppose A is a uniform linear threshold 
machine on fi and that {A ± c Tt*^] is a sequence of subsets 
of R * (Here {it" ') represents an expanding collection 
of retinas ir. R . ) Then we say that [A.} is a saturation 
sequence if thare axiats an integer N such that 



J^UA, U 



... U 



*N 



saturates K with reapect to A, on R 



*{!> 



The irjain result about saturation la now: 
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7_.?l Saturation Thoa^ir Let H be a. uniform linear threshold 
machine on ft and let [^ c ft 4 ^) be any_ infinite sequence 
of disjoint sets. Then [A,] contains a subsequence which 
is a saturation sequence* 

Proof: Let ft = t % and $ = (J^,..*,^) 

1 

As in 5 3 let 



■j(X) - I uj fx) 

xex 



Define the number y*(X) toy 



if M*) = 



Yj(X) = < 1 if Sj (x) > 
■1 if S (x) < 

and let r(X) he the r-tuple 

r(X) - ( Tl <X3.Y ? (x),;*.>v r (x)) . 

Since there are only a finite number of possible values for 
F(X) j, there must be an infinite subsequence of the (A.} for 
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whlch r(A. ) takes the same fixed value, We claim that thie 
la the desired saturation sequence* 

To prove this, first renumber the A. f £ so that 

A j A f . . . la the subsequence picked out above* Also for 

_i 1 

convenience renumber the the B "s so that A ± c H . Mow 

let 



M = max max abE(S.(X)) 



Let 



T = max abe 9* 

J = -'-t ■ ■■ ■ r -T 



T + M 
and choose N > — - — -I- 1 



where e is the minimum sensitivity of the linear threshold 
functions £, j . * * * i * (Recall §2^5+] 

Let A - A ar.d let B = A^ U A. \\ . . . I) A H and let 
S be any subset of B 1 , We will show that 

^ (A u B M Si = ^[B rl SI 

for j = 1 M 2,*+.T • This will prove the theorem. 



Cr 



s; 1 , Suppose we have a J for which S *( A ) = Q > H«"ee 
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Ej(B) and SJA g B) are also , EO 



Sj(A U S y S) = SjfS) - Sj(B U S) 



ana therefore £ (A y B g S] ^ ^.(B u S) 

■■■ 



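Case 2: Suppose $S_j(A) > 0$. Then, by (2.5), we have $S_j(A) \geq \varepsilon$
(and likewise for each $A_i$, since they share the same $\Gamma$), and so

$$S_j(B) \geq (N - 1)\varepsilon > T + M.$$

Also, by choice of $M$ we have $|S_j(S)| \leq M$, so

$$S_j(B \cup S) > T \geq \theta_j$$

and

$$S_j(A \cup B \cup S) > T + \varepsilon > \theta_j.$$

Hence $\varphi_j(A \cup B \cup S) = \varphi_j(B \cup S) = 1$.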

Case 3: Suppose $S_j(A) < 0$. Then, as above, we have

$$S_j(A) \leq -\varepsilon \quad \text{and} \quad S_j(B) \leq -(N-1)\varepsilon < -T - M.$$

Also $|S_j(S)| \leq M$, so

$$S_j(B \cup S) < -T \leq \theta_j$$

and

$$S_j(A \cup B \cup S) < -T - \varepsilon < \theta_j.$$

Hence $\varphi_j(A \cup B \cup S) = \varphi_j(B \cup S) = 0$.

This completes the proof.
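The construction in the proof can be tried out on a small example. The following is a minimal sketch, not the paper's own machinery: the three weight functions, the thresholds, the retina `R1`, and the value `EPS = 1` (standing in for the minimum sensitivity, which is plausible only because the toy weights are integers) are all illustrative assumptions. It picks sets with a common sign vector, chooses N larger than (T + M)/EPS + 1, and checks by brute force that adding A never changes any threshold output, for every S in the first retina that is disjoint from A.

```python
from itertools import combinations

# Toy rank-3 machine on the integer points 0, 1, 2, ... (all values hypothetical).
MU = [
    lambda x: 1,                        # mu_1: every point has weight 1
    lambda x: 1 if x % 2 == 0 else -1,  # mu_2: alternating weights
    lambda x: (x % 3) - 1,              # mu_3: weights -1, 0, 1 repeating
]
THETA = [2, 0, -1]   # thresholds theta_j
EPS = 1              # assumed sensitivity: integer weights, so nonzero |S_j| >= 1


def S(j, X):
    """S_j(X): sum of mu_j over the points of X."""
    return sum(MU[j](x) for x in X)


def phi(j, X):
    """phi_j(X) = 1 if S_j(X) > theta_j, else 0."""
    return 1 if S(j, X) > THETA[j] else 0


def gamma(X):
    """Sign vector (sign S_1(X), ..., sign S_r(X))."""
    return tuple((S(j, X) > 0) - (S(j, X) < 0) for j in range(len(MU)))


def subsets(points):
    """Yield every subset of a finite set of points."""
    pts = sorted(points)
    for k in range(len(pts) + 1):
        for combo in combinations(pts, k):
            yield set(combo)


def saturation_demo():
    R1 = set(range(5))                           # first retina (assumed)
    candidates = [{x} for x in range(4, 400)]    # disjoint single-point sets
    A = candidates[0]                            # A = A_1, contained in R1
    same = [C for C in candidates[1:] if gamma(C) == gamma(A)]
    M = max(abs(S(j, X)) for j in range(len(MU)) for X in subsets(R1))
    T = max(abs(t) for t in THETA)
    N = (T + M) // EPS + 2                       # guarantees N > (T + M)/EPS + 1
    B = set().union(*same[: N - 1])              # B = A_2 u ... u A_N
    ok = all(
        phi(j, A | B | Sset) == phi(j, B | Sset)
        for j in range(len(MU))
        for Sset in subsets(R1 - A)              # S taken disjoint from A
    )
    return N, ok


N, ok = saturation_demo()
```

In this toy setting `ok` is true: once B is present, the outputs of all three threshold elements are the same with or without A.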

To use the Saturation Theorem we proceed as follows:
first find a figure $X$ which we would like to make "invisible"
to $\mathcal{M}$, then embed $X$ in a saturation sequence $\{X_i\}$ so that

$$X_2 \cup X_3 \cup \dots \cup X_N =_{df} \operatorname{sat}(X)$$

saturates $\mathcal{M}$ with respect to $X$. The following proposition
illustrates the technique:

Proposition 7.4 Let $\psi_{adj}$ be the predicate

$$\psi_{adj}(X) = \lceil X \text{ contains at least 2 adjacent points of } R \rceil.$$

Then $\psi_{adj}$ cannot be computed by a uniform linear threshold
machine.



Proof: Suppose $\mathcal{M}$ is a uniform linear threshold machine.
Let $\{A_i\}$ be a sequence of single points of $R$, spaced at
least 3 apart. By Theorem 7.3, $\{A_i\}$ contains a saturation
subsequence $B_1, B_2, \dots$. Then, as indicated above, let
$B = B_1 \subset R^{(1)}$ and let $B_2 \cup B_3 \cup \dots \cup B_N = \operatorname{sat}(B)$ saturate
$\mathcal{M}$ with respect to $B$ on $R^{(1)}$. Now let $S$ be the figure
consisting of a single point of $R^{(1)}$, adjacent to $B$ but
not adjacent to any of the other $B_i$. Then, by saturation,

$$\mathcal{M}(\operatorname{sat}(B) \cup S) = \mathcal{M}(B \cup \operatorname{sat}(B) \cup S)$$

but $\psi_{adj}(\operatorname{sat}(B) \cup S)$ is false while $\psi_{adj}(B \cup \operatorname{sat}(B) \cup S)$
is true. (See Figure 4.)











Figure 4
Saturation sequence for $\psi_{adj}$ (labels: $S$ and $B$ in $R^{(1)}$; $B_2$, $B_3$, ..., $B_N$)
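The two figures compared in the proof can be written down explicitly. The sketch below is only an illustration: it assumes 4-neighbour adjacency on an integer grid, and the five points used for the saturating set are an arbitrary stand-in, since the real number of points depends on the machine. It shows that the adjacency predicate separates the two figures that a saturated machine is forced to treat alike.

```python
def adjacent(p, q):
    """4-neighbour adjacency on an integer grid (an assumed convention)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1


def psi_adj(X):
    """True when X contains at least two adjacent points."""
    pts = list(X)
    return any(adjacent(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])


B = {(0, 0)}                                  # the point hidden by saturation
sat_B = {(3 * i, 0) for i in range(1, 6)}     # stand-in for the other points, 3 apart
S_fig = {(0, 1)}                              # adjacent to B and to nothing else

without_B = psi_adj(sat_B | S_fig)            # no adjacent pair present
with_B = psi_adj(B | sat_B | S_fig)           # (0, 0) and (0, 1) are adjacent
```

Here `without_B` is false and `with_B` is true, which is exactly the disagreement the proof exploits.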



Remark: This proposition stands in sharp contrast to the
perceptron case, where $\psi_{adj}$ is easily computed by a perceptron
of order 2.
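One way to see the order-2 construction is to use one partial predicate per adjacent pair of points, each looking at only two points, and to threshold their sum at zero. The sketch below assumes a finite rectangular grid with 4-neighbour adjacency; the function names and the unit weights are illustrative choices rather than anything fixed by the text.

```python
def adjacent_pairs(width, height):
    """All horizontally or vertically adjacent point pairs of a width x height grid."""
    pairs = []
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                pairs.append(((x, y), (x + 1, y)))
            if y + 1 < height:
                pairs.append(((x, y), (x, y + 1)))
    return pairs


def perceptron_adj(X, width, height):
    """Order-2 perceptron: unit weight on each pair mask, threshold 0."""
    total = sum(1 for p, q in adjacent_pairs(width, height) if p in X and q in X)
    return total > 0
```

Each mask fires only when both of its two points belong to the figure, so the sum is positive exactly when some adjacent pair is present.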
8. Topological Invariants.

We have already seen (4.4) that the only topological
invariants which could be computed by a finite rank linear
threshold machine are those which depend only on the Euler
characteristic. Now we apply the Saturation Theorem to
conclude that not even these "Euler predicates" are computable
in a uniform way, and that consequently uniform linear threshold
machines cannot compute any non-trivial topological invariant.

Theorem 8.1 Suppose $\mathcal{M}$ is a uniform linear threshold machine
such that $\mathcal{M}(X_1) = \mathcal{M}(X_2)$ whenever $X_1$ and $X_2$ are topologically
equivalent. Then, in fact, $\mathcal{M}(X) = \mathcal{M}(Y)$ for any non-empty
sets $X$ and $Y$.

Proof: Let $T$ denote the annulus illustrated in Figure 5.
Let $X$ be any non-empty figure. We will show that $\mathcal{M}(X) = \mathcal{M}(T)$.




Figure 5
The standard annulus, $T$



Step 1: Let $e(X)$ be the Euler characteristic of $X$. By
applying 4.4 and choosing the retina large enough
we have that $\mathcal{M}(X)$ is equal to the value of $\mathcal{M}$ on one of the
"canonical figures with Euler characteristic $e(X)$", i.e.

    $e(X)$ disjoint squares if $e(X) > 0$

or

    a $(1 - e(X))$-holed annulus if $e(X) \leq 0$

(see Figure 6). Thus we need only show that $\mathcal{M}(X) = \mathcal{M}(T)$
for $X$ equal to any of these canonical forms.



Figure 6
Canonical figures for Euler characteristic (panels: $e(X) > 0$ and $e(X) \leq 0$)
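The Euler characteristics of the canonical figures are easy to check numerically. The sketch below assumes a figure is a set of closed unit squares on a grid and counts vertices minus edges plus faces; under this convention squares that touch only at a corner count as connected, which is an assumption of the sketch and not something fixed by the text.

```python
def euler_characteristic(squares):
    """V - E + F for a figure made of closed unit squares.

    `squares` is a set of (i, j) pairs; square (i, j) covers [i, i+1] x [j, j+1].
    """
    vertices, edges = set(), set()
    for i, j in squares:
        vertices.update([(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)])
        edges.update({
            ((i, j), (i + 1, j)),          # bottom edge
            ((i, j + 1), (i + 1, j + 1)),  # top edge
            ((i, j), (i, j + 1)),          # left edge
            ((i + 1, j), (i + 1, j + 1)),  # right edge
        })
    return len(vertices) - len(edges) + len(squares)


annulus = {(i, j) for i in range(3) for j in range(3)} - {(1, 1)}
two_squares = {(0, 0), (5, 0)}
annulus_plus_square = annulus | {(3, 1)}   # a square glued to the side of the annulus
```

The 3 x 3 ring has characteristic 0, two disjoint squares have characteristic 2, and gluing one square onto the side of the ring leaves the characteristic at 0, the fact used for the set formed by a square adjacent to an annulus in Case 2 of the proof.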



Step 2. In the retina sequence $\mathcal{R}$ choose a sequence of
disjoint copies of $T$. Now use the
Saturation Theorem 7.3 to find a sequence $T_1, T_2, \dots, T_N$ so
that

$$T_2 \cup T_3 \cup \dots \cup T_N = \operatorname{sat}(T_1)$$

saturates $\mathcal{M}$ with respect to $T_1$ on $R^{(1)}$. (See Figure 7.)
Notice that by (4.4) we have $\mathcal{M}(X) = \mathcal{M}(X \cup \operatorname{sat}(T_1))$ since
these sets have the same Euler characteristic.









Figure 7
A saturation sequence of annuli

Case 1: Suppose that $e(X) \leq 0$ so that the canonical
form for $X$ is an $n$-holed annulus. Consider the set
$X \cup \operatorname{sat}(T_1)$. By topological invariance we can deform
$X \cup \operatorname{sat}(T_1)$ without changing the value of $\mathcal{M}$ so that the
"end-position hole" of $X$ moves over to become $T_1$ (Figure
8). I.e., $X$ deforms to $Y \cup T_1$, where $Y$ has $(n-1)$ holes.





Figure 8
$X$ deforms to $Y \cup T_1$



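Thus we have

$$\mathcal{M}(X) = \mathcal{M}(X \cup \operatorname{sat}(T_1)) = \mathcal{M}(Y \cup T_1 \cup \operatorname{sat}(T_1)) = \mathcal{M}(Y \cup \operatorname{sat}(T_1)) = \mathcal{M}(Y).$$

Proceeding inductively, we can reduce the number of holes of
$X$ one by one until there is only one hole left, i.e., $X$
reduces to an annulus.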

Case 2: Suppose that $e(X) > 0$ so that the canonical form
for $X$ is $n$ disjoint squares. Let $s_n$ denote the "end
most" square and $X = Y \cup s_n$. Consider again the set
$X \cup \operatorname{sat}(T_1)$. Once again the value of $\mathcal{M}$ is unchanged if we
deform this set by moving $s_n$ over to be adjacent to the
position occupied on the retina by $T_1$. (See Figure 9.)



Figure 9
$X \cup T_1$ deforms to $Y \cup s_n \cup T_1$

Thus

$$\mathcal{M}(X) = \mathcal{M}(X \cup \operatorname{sat}(T_1)) = \mathcal{M}(Y \cup s_n \cup \operatorname{sat}(T_1)) = \mathcal{M}(Y \cup s_n \cup T_1 \cup \operatorname{sat}(T_1))$$

where the last equality uses saturation to add in $T_1$. But
if $s_n$ is directly adjacent to $T_1$ then the set $s_n \cup T_1$ is
itself topologically an annulus and so has Euler characteristic
zero. Thus

$$\mathcal{M}(Y \cup s_n \cup T_1 \cup \operatorname{sat}(T_1)) = \mathcal{M}(Y) \quad (\text{as long as } Y \neq \emptyset).$$

Proceeding in this way, we can eliminate the squares of $X$
one by one.

This completes the proof.



9. Figures in Context.

We recall the following definition from §6.6 of [1].

Definition 9.1 If $\psi$ is a predicate then define a new predicate
$\psi_{\text{in context}}$:

$$\psi_{\text{in context}}(X) = \lceil \psi(Y) \text{ for some connected component } Y \text{ of } X \rceil.$$
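The definition translates directly into a small procedure: split the figure into connected components and apply the base predicate to each. The sketch below assumes figures are sets of integer grid points with 4-neighbour connectivity; the helper names and the example predicate `is_single_point` are hypothetical and chosen only for illustration.

```python
from collections import deque


def components(X):
    """Connected components of a set of grid points under 4-neighbour adjacency."""
    remaining, result = set(X), []
    while remaining:
        start = remaining.pop()
        comp, queue = {start}, deque([start])
        while queue:
            x, y = queue.popleft()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        result.append(comp)
    return result


def in_context(psi):
    """Return the predicate X -> [psi(Y) holds for some connected component Y of X]."""
    return lambda X: any(psi(Y) for Y in components(X))


def is_single_point(Y):
    """Example base predicate: the figure consists of exactly one point."""
    return len(Y) == 1


single_in_context = in_context(is_single_point)
```

For instance, `single_in_context` holds for a figure made of a two-point bar plus one isolated point, and fails for the bar alone.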



Papert and Minsky show that, for such predicates as
$\psi(X) = \lceil X \text{ is a hollow square} \rceil$, $\psi_{\text{in context}}$ cannot be computed by
a finite order perceptron. In this section we show that
uniform linear threshold machines can compute $\psi_{\text{in context}}$
for only the most trivial kind of predicate $\psi$.

Definition 9.2 We say that a predicate $\psi$ is divisible if
$\psi$ satisfies the following condition: For every connected
set $X$ on which $\psi$ is true, if we divide $X$ into two
disjoint connected sets $X = A \cup B$, then $\psi(A)$ is true or
$\psi(B)$ is true.

We can see that most "interesting" geometric predicates
are not divisible. For example, if $\psi(X)$ is true and $\psi$
is divisible, then by continual subdividing we see that $\psi$
must be true on the set consisting of one single square of $X$.
Consequently, any predicate which is both divisible and translation
invariant must be true on the figure consisting of a single
square. Not all predicates which are true on single squares
are divisible. Figure 10, for example, illustrates that
$\lceil X \text{ is a square} \rceil$ is not divisible.

Figure 10
$\lceil X \text{ is a square} \rceil$ is not divisible
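A concrete witness of non-divisibility is easy to write down. The sketch below assumes figures are sets of integer grid points and uses one possible division of a 2 x 2 square into two columns; the particular division drawn in the original figure is not recoverable here, so this split is only an illustrative choice.

```python
def is_square(Y):
    """True when Y is a filled k x k block of grid points, k >= 1."""
    if not Y:
        return False
    xs = [p[0] for p in Y]
    ys = [p[1] for p in Y]
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    return w == h and len(Y) == w * h


X = {(0, 0), (1, 0), (0, 1), (1, 1)}   # a 2 x 2 square
A = {(0, 0), (0, 1)}                    # left column, connected
B = {(1, 0), (1, 1)}                    # right column, connected
witness = is_square(X) and not is_square(A) and not is_square(B)
```

The predicate is true on X but false on both connected pieces, so it fails the divisibility condition even though it is true on a single point.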



Theorem 9.3 Suppose $\mathcal{M}$ is a uniform linear threshold machine
which computes $\psi_{\text{in context}}$ for some translation invariant
predicate $\psi$. Then $\psi$ must be divisible.



Proof: Suppose $\psi$ is not divisible. Then there exist
disjoint connected figures $A$ and $B$ such that $X = A \cup B$ is connected,
$\psi(A) = \psi(B) = 0$ and $\psi(X) = 1$. Choose a saturation sequence
$\{B_i\}$ of sets which are all congruent to $B$ and translate $A$
so that $A \cup B_1$ is congruent to $X$. (See Figure 11.)



Figure 11
Saturation sequence for $\psi_{\text{in context}}$



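Then we have $\psi(B_i) = 0$ for $i = 1, \dots, N$. Letting
$B_2 \cup \dots \cup B_N = \operatorname{sat}(B_1)$ we have

$$\psi_{\text{in context}}(A \cup \operatorname{sat}(B_1)) = 0$$

and

$$\psi_{\text{in context}}(A \cup B_1 \cup \operatorname{sat}(B_1)) = \psi_{\text{in context}}(X \cup \operatorname{sat}(B_1)) = 1.$$

On the other hand, we must have

$$\mathcal{M}(A \cup \operatorname{sat}(B_1)) = \mathcal{M}(A \cup B_1 \cup \operatorname{sat}(B_1))$$

by saturation, so $\mathcal{M}$ cannot compute $\psi_{\text{in context}}$.

The negation of $\psi_{\text{in context}}$ is given by

$$\psi_{all}(X) = \lceil \psi(Y) = 0 \text{ for all connected components } Y \text{ of } X \rceil.$$

We leave it to the reader to formulate and prove the
corresponding theorem for $\psi_{all}$. E.g.,

$$\eta(X) = \lceil \text{every component of } X \text{ is a rectangle} \rceil$$

cannot be computed by a uniform linear threshold machine.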



10. Bounds for Saturation.

We have shown that uniform linear threshold machines
which purport to recognize even very simple predicates must
eventually fail on arbitrarily large retinas. But how large
is arbitrarily large? This section provides a bound, albeit
a rather weak one, in terms of constants associated with the
machines.

Definition 10.1 Let $\mathcal{M}$ be a uniform linear threshold
machine of rank $r$ on $\mathcal{R} = \{R^{(n)}\}$.
Let $S_i(X) = \sum_{x \in X} \mu_i(x)$, let $S(X) = \max_{i=1,\dots,r} |S_i(X)|$,
let $T = \max_{i=1,\dots,r} |\theta_i|$, and let $\varepsilon$ be the minimum sensitivity
(§2.5) of the $\varphi_i$. Now define sequences $m(i)$ and
$b(i)$ by

$$m(0) = \max_{X \subset R^{(1)}} S(X)$$

$$b(1) = \frac{m(0) + T}{\varepsilon} + 1$$

and, inductively,

$$m(n) = \max_{X \subset R^{(b(n))}} S(X)$$

$$b(n+1) = b(n) + \frac{n+1}{\varepsilon}\,(m(n) + T).$$

Finally, let $N(\mathcal{M}) = b(3^r)$.

Theorem 10.2 Let $\mathcal{M}$ be as above and let $\{A_i \subset R^{(i)}\}$ be any
sequence of disjoint sets. Then there exist, among the first
$N(\mathcal{M})$ terms of the sequence, a set $A$ = some $A_i \subset R^{(n)}$ and
a set $B$ = (union of $A_i$'s) such that $B$ saturates $\mathcal{M}$ with
respect to $A$ on $R^{(n)}$.

In other words, for the purposes of applying the
Saturation Theorem, $N(\mathcal{M})$ is "arbitrarily large".



Proof: If we examine the proof of the Saturation Theorem (7.3)
we see that we can saturate $\mathcal{M}$ if we can find some $A_i \subset R^{(n)}$
and $N$ more $A$'s, $A_{i(1)}, A_{i(2)}, \dots, A_{i(N)}$, such that

1) All the $A_i$ have the same vector $\Gamma(A_i)$

2) $N > \dfrac{M + T}{\varepsilon}$ where $M = \max_{X \subset R^{(n)}} |S(X)|$

Since there are $3^r$ possible values for $\Gamma(A_i)$ the theorem
will follow at once from the Lemma below, if we choose $p = 3^r$
and $f(X) = \Gamma(X)$.

Lemma 10.3 Let $f$ be a function from subsets of $R$ to the
set of integers $\{1, 2, \dots, p\}$. Then from any sequence of
$b(p)$ subsets of $R$ we can extract sets $A \subset R^{(n)}$ and
$A_{i(1)}, A_{i(2)}, \dots, A_{i(N)}$ with

1) $f(A) = f(A_{i(1)}) = \dots = f(A_{i(N)})$

2) $N > \dfrac{M + T}{\varepsilon}$, where $M = \max_{X \subset R^{(n)}} |S(X)|$.

Proof: By induction on $p$.

First, if $p = 1$, then all $A$'s in the sequence
automatically have the same value $f(A)$, so choose $A = A_1 \subset R^{(1)}$
and condition (2) is fulfilled since $b(1) > \dfrac{m(0) + T}{\varepsilon}$.

Now we assume that the lemma is true for $p$ and prove
it for $p + 1$. Suppose we have a sequence of $b(p+1)$
elements, with $f$ taking values in the set $\{1, 2, \dots, p+1\}$.
Break the sequence into two pieces:

    the first $b(p)$ elements

and

    the remaining $\dfrac{p+1}{\varepsilon}(m(p) + T)$ elements.

If $f$ applied to the first piece takes on at most $p$
distinct values then the lemma follows by induction.
Otherwise $f$ takes on all $p + 1$ possible values among
the first $b(p)$ elements. Now among the second group of
elements there is some value which $f$ assumes at least
$\dfrac{m(p) + T}{\varepsilon}$ times. But there is also some set $A$ among the
first $b(p)$ elements on which $f$ assumes this value. So let
the desired sequence be $A \subset R^{(b(p))}$ followed by the
$\dfrac{m(p) + T}{\varepsilon}$ elements selected from the second group.



Example 10.4 Suppose that the measure functions $\mu_i$ used by
$\mathcal{M}$ are bounded, i.e., $|\mu_i(x)| \leq k$ for $x \in R$. Suppose also
that the size of the retinas $R^{(n)}$ grows linearly with $n$,
$|R^{(n)}| = cn$. (This is sufficient for all applications of the
Saturation Theorem in Sections 8 and 9.) Then we can estimate
$N(\mathcal{M})$:

$$m(n) = kc\,b(n)$$

$$b(n+1) = b(n) + \frac{n+1}{\varepsilon}\,[kc\,b(n) + T]$$

or

$$b(n+1) - b(n) = \frac{n+1}{\varepsilon}\,[kc\,b(n) + T]$$



and we can estimate the growth of $b(n)$ by considering the
differential equation

$$\frac{dy}{dx} = C_1 x y + C_2 x.$$

This has the solution

$$y = A e^{C_1 x^2 / 2} - \frac{C_2}{C_1}$$

and so we find

$$\log b(n) \sim n^2.$$
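The recurrence of the bounded example can also be tabulated directly. The sketch below is a toy run under the example's assumptions (weights bounded by k, retina size c times n) with the hypothetical values k = c = T = eps = 1; the function name `b_sequence` is introduced here only for illustration. It lists the first few values of b(n) and compares log b(n) with n squared.

```python
import math


def b_sequence(k, c, T, eps, n_max):
    """Return [b(1), ..., b(n_max)] for the bounded example, using m(n) = k*c*b(n)."""
    m0 = k * c                      # m(0): bound on |S| over the first retina, of size c
    b = [(m0 + T) / eps + 1]        # b(1) = (m(0) + T)/eps + 1
    for n in range(1, n_max):
        m_n = k * c * b[-1]         # m(n) = k*c*b(n)
        b.append(b[-1] + (n + 1) / eps * (m_n + T))   # b(n+1)
    return b


values = b_sequence(k=1, c=1, T=1, eps=1, n_max=6)
log_ratios = [math.log(v) / (n * n) for n, v in enumerate(values, start=1)]
```

In this run the ratio of log b(n) to n squared stays bounded (it decreases slowly), which is consistent with reading the n squared scale of the continuous estimate as an upper estimate on the growth of log b(n).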



Finally $N(\mathcal{M}) = b(3^r)$ so we get

$$\log \log N(\mathcal{M}) \sim r.$$

Corollary 10.5: For a bounded linear threshold machine to
avoid being saturated on large retinas, the rank must grow
at least as fast as $\log \log |R|$.



11. Conclusion.

It is instructive to compare the results of this paper
with those of [1]. Minsky and Papert demonstrated limitations
of perceptrons of small order, providing mathematical justification
for the intuition that these computational schemes
are somehow too "local" to deal with such "global" predicates
as connectivity. Here we have taken a complementary point of
view, investigating the limitations of the linear threshold
element itself as a decision element.

Like Minsky and Papert we believe that the value of
this work lies in the general phenomena that it illuminates
rather than in the precise statements of the theorems. In
our case, we have shown that Minsky and Papert's "stratification
phenomenon" appears in the class of linear threshold machines
as well as in perceptrons. We have also indicated the importance
of saturation as a potential pitfall for any machine attempting
to recognize patterns using only a small number of threshold
elements.

Hopefully, all of these results will someday be
subsumed by a general mathematical theory of pattern recognition,
a theory which will clarify the intuitive guess that
any system for "general purpose" pattern recognition must
have the ability to "focus in on" local features and also the
ability to combine this local data in flexible "global" ways.